Return-Path: <connolly@pixel.convex.com>
Received: from dxmint.cern.ch by nxoc01.cern.ch (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA08306; Thu, 14 Jan 93 01:56:19 MET
Received: by dxmint.cern.ch (5.65/DEC-Ultrix/4.3)
	id AA18151; Thu, 14 Jan 1993 02:11:29 +0100
Received: from pixel.convex.com by convex.convex.com (5.64/1.35)
	id AA17773; Wed, 13 Jan 93 19:11:17 -0600
Received: from localhost by pixel.convex.com (5.64/1.28)
	id AA21994; Wed, 13 Jan 93 19:11:15 -0600
Message-Id: <9301140111.AA21994@pixel.convex.com>
To: www-talk@nxoc01.cern.ch
Subject: suggested libWWW architecture
Date: Wed, 13 Jan 93 19:11:15 CST
From: Dan Connolly <connolly@pixel.convex.com>

I sent this to Tim a while ago, but I don't think he's had time to look
at it.

Meanwhile, libWWW is becoming reentrant, but I still think the
architecture is kinda clumsy: you have to have a big data structure
describing the DTD, and a routine for each element, etc. This doesn't
mesh well with the MidasWWW architecture, which can read the DTD from
the X resource database at runtime.

I have an idea for an architecture that the linemode browser and
MidasWWW could share (along with other new implementations). It's not
radically different from the current libWWW, but there's a lot of
grunt-work between the current libWWW and what I've got here. But I
think the end result would be much more usable.

We start with the HText class.
Instead of the various style and append methods, we have four methods
in a virtual function table:

typedef struct{
  int (*start_tag) PARAMS((SGML_Object this, CONST char* gi,
                           CONST char** attributes, int nattrs));
  VOID (*end_tag) PARAMS((SGML_Object this, CONST char* gi));
  VOID (*entity) PARAMS((SGML_Object this, CONST char* name));
  VOID (*data) PARAMS((SGML_Object this, CONST char* data, int char_qty));
}SGML_DocClass;

The linemode would declare something like:

SGML_DocClass griddoc = {HText_start_tag, HText_end_tag,
                         HText_entity, HText_data};

The HText implementation is responsible for keeping track of the stack
of open elements, if it needs to.

On top of these we build some format parsing routines:

SGML_parse(void* dest, void* closure, void* stream, int (getc)(void*));
/* pseudocode:
  int read, content;
  char buffer[1000];
  SGML_DocClass *docclass = (SGML_DocClass*)closure;

  while( (read = SGML_read(buffer, content, stream, getc)) != EOF){
    switch(read){
    case SGML_start_tag:
      ... parse name, attributes ...
      content = (docclass->start_tag)(dest, name, attrs);
      if(content == empty){
        (docclass->end_tag)(dest, name);
        content = MIXED; // @@ could be ELEMENT
      }
      break;
    case SGML_end_tag:
      ... parse name ...
      (docclass->end_tag)(dest, name);
      content = MIXED; // @@ could be ELEMENT
      break;
    case SGML_entity:
      (docclass->entity)(dest, name);
      break;
    default:
      (docclass->data)(dest, buffer);
    }
  }
*/

PlainText_parse(HText* dest, void* docclass, void* stream, int (getc)(void*));
/* pseudocode:
  (docclass->start_tag)(dest, "HTML");
  (docclass->start_tag)(dest, "BODY");
  (docclass->start_tag)(dest, "PRE");
  keep a local buffer of about 1000 chars.
  Call (getc)(stream) until EOF.
  Call HText_data(dest, buffer) whenever buffer is full.
  (docclass->end_tag)(dest, "PRE");
  (docclass->end_tag)(dest, "BODY");
  (docclass->end_tag)(dest, "HTML");
*/

GopherListing_parse(HText* dest, void* docclass, void* stream, int (getc)(void*));
/* pseudocode:
  (docclass->start_tag)(dest, "HTML");
  (docclass->start_tag)(dest, "BODY");
  (docclass->start_tag)(dest, "MENU");
  while(Gopher_parse_line(stream, getc, type, name, host, port, path)){
    char addr[BIG];
    sprintf(addr, "gopher://%s:%d/%c%s", host, port, type, path);
    (docclass->start_tag)(dest, "A", "HREF", addr, 0);
    (docclass->data)(dest, name);
    (docclass->end_tag)(dest, "A");
  }
  (docclass->end_tag)(dest, "MENU");
  (docclass->end_tag)(dest, "BODY");
  (docclass->end_tag)(dest, "HTML");
*/

We register each of these with the following routine:

int ContentType_register(CONST char* type, CONST char* subtype,
                         HTParseProc parse, void* closure);

For example:

main()
{
  ContentType_register("TEXT", "X-HTML", HTML_parse, &griddoc);
  ContentType_register("TEXT", "PLAIN", PlainText_parse, &griddoc);
  ContentType_register("APPLICATION", "X-GOPHER", GopherListing_parse, &griddoc);
}

The following routine can be used for any MIME entity. It will dispatch
the appropriate parsing routine based on the content type header:

int ContentType_parse(const char* ct, HText* dest,
                      void* stream, int (getc)(void*));

Then we build some load routines, one per access scheme:
(note that this design separates format from the access scheme,
which allows us to, for example, load a gopher menu from a local
file, or load HTML text from a Gopher server)

/* I don't have error handling worked out yet. We need to have
   a coherent design for this. It's a mess in the current WWWlib. */

/* I think the WWW file: should be split into ftp: and local-file:.
   It's cleaner to implement; there are precedents in the MidasWWW
   local: scheme and the MIME ftp and local-file access-types.
*/

int LocalFile_load(HText* dest, CONST char* path, CONST char* search)
{
  FILE* stream;

  if((stream = fopen(path, "r"))){
    const char* content_type = WWW_zen_content_type_from_extension(path);
    ContentType_parse(content_type, dest, (void*)stream,
                      (int (*)(void*))getc);
    fclose(stream);
    return 1;
  }else{
    /* log an error */
    return 0;
  }
}

int FTP_load(HText* dest, CONST char* path, CONST char* search);

int HTTP_load(HText* dest, CONST char* path, CONST char* search);

int Gopher_load(HText* dest, CONST char* path, CONST char* search)
{
  const char* content_type = Gopher_zen_content_type_from_gtype_char(*path);
  char* host = HTParse(path, PARSE_HOST);
  char* portnum = HTParse(path, PARSE_PORT);
  int port = atoi(portnum);
  static char* tab = "\011";    /* ASCII TAB separates selector and search */
  static char* crlf = "\015\012";
  void* stream = TCPOpen(host, port);

  if(stream){
    TCPwrite(stream, path, strlen(path));
    if(search){
      TCPwrite(stream, tab, 1);
      TCPwrite(stream, search, strlen(search));
    }
    TCPwrite(stream, crlf, 2);
    ContentType_parse(content_type, dest, stream, TCPgetc);
    TCPclose(stream);
    return 1;
  }else{
    /* log an error */
    return 0;
  }
}

Then we register these just like formats:

HTAccess_register(const char* name, HTLoadProc load, void* closure);

And the HTLoadDocument routine in HTAccess.c becomes this:

int HTAccess_load(HText* dest, HTParentAnchor* p, CONST char* address)
{
  char* scheme = HTParse(address, PARSE_SCHEME);
  /* path is everything after the colon, except the anchor */
  char* path = HTParse(address, PARSE_HOST|PARSE_PORT|PARSE_PATH);
  char* anchor = HTParse(address, PARSE_ANCHOR);
  char* search = HTParse(address, PARSE_SEARCH_TERMS);
  void* closure;
  HTLoadProc load;

  dest = HText_new(p); /* check for doc already loaded in p @@ */

  if((load = /* load routine registered for scheme. find closure too */)){
    (load)(dest, path, search, closure);
  }

  HTSelect(dest, anchor);
}

What do you think?

Dan
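
The message declares ContentType_register and ContentType_parse but gives no bodies for them. What follows is only a minimal sketch of how that pair might work, assuming a small fixed-size table and a case-insensitive match on the "type/subtype" part of the header (parameters after ";" are ignored). The HTParseProc typedef, MAX_TYPES, same_token, and the Count_parse and String_getc demo helpers are hypothetical names introduced for illustration, and dest is a plain void* here because HText is not defined in the message.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed parse-proc shape, modelled on the prototypes in the message. */
typedef int (*HTParseProc)(void *dest, void *closure, void *stream,
                           int (*getc_fn)(void *));

#define MAX_TYPES 16                 /* hypothetical table size */

static struct {
    const char *type, *subtype;
    HTParseProc parse;
    void *closure;
} registry[MAX_TYPES];
static int n_registered = 0;

/* Case-insensitive compare of s[0..n) against the NUL-terminated token t. */
static int same_token(const char *s, size_t n, const char *t)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (t[i] == '\0' ||
            toupper((unsigned char)s[i]) != toupper((unsigned char)t[i]))
            return 0;
    }
    return t[n] == '\0';
}

/* Returns 1 on success, 0 if the table is full. */
int ContentType_register(const char *type, const char *subtype,
                         HTParseProc parse, void *closure)
{
    assert(type && subtype && parse);
    if (n_registered == MAX_TYPES)
        return 0;
    registry[n_registered].type = type;
    registry[n_registered].subtype = subtype;
    registry[n_registered].parse = parse;
    registry[n_registered].closure = closure;
    n_registered++;
    return 1;
}

/* Dispatches on "type/subtype"; returns -1 if no parser is registered. */
int ContentType_parse(const char *ct, void *dest, void *stream,
                      int (*getc_fn)(void *))
{
    const char *slash = strchr(ct, '/');
    size_t tlen, slen;
    int i;

    if (!slash)
        return -1;
    tlen = (size_t)(slash - ct);
    slen = strcspn(slash + 1, "; \t");   /* stop before any parameters */
    for (i = 0; i < n_registered; i++) {
        if (same_token(ct, tlen, registry[i].type) &&
            same_token(slash + 1, slen, registry[i].subtype))
            return registry[i].parse(dest, registry[i].closure,
                                     stream, getc_fn);
    }
    return -1;
}

/* Demo getc: the stream is a pointer to a cursor into a C string. */
int String_getc(void *stream)
{
    const char **cursor = (const char **)stream;
    if (**cursor == '\0')
        return EOF;
    return (unsigned char)*(*cursor)++;
}

/* Demo parser: counts the characters read into *(int *)dest. */
int Count_parse(void *dest, void *closure, void *stream,
                int (*getc_fn)(void *))
{
    int *count = (int *)dest;
    (void)closure;
    while (getc_fn(stream) != EOF)
        (*count)++;
    return 1;
}
```

With this in place, registering Count_parse for "TEXT"/"PLAIN" and calling ContentType_parse with a "text/plain; charset=us-ascii" header runs the demo parser over the stream, while an unregistered type such as "image/gif" falls through to -1. A real version would need the error-handling design the message says is still open.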